Skip to content

DRA: Derived Attributes#6081

Merged
k8s-ci-robot merged 8 commits into
kubernetes:masterfrom
gauravkghildiyal:dra-derived-attributes
Jun 15, 2026
Merged

DRA: Derived Attributes#6081
k8s-ci-robot merged 8 commits into
kubernetes:masterfrom
gauravkghildiyal:dra-derived-attributes

Conversation

@gauravkghildiyal

@gauravkghildiyal gauravkghildiyal commented May 15, 2026

Copy link
Copy Markdown
Member
  • One-line PR description: Add KEP-6080 introducing derivedAttributes for flexible DRA device co-allocation.
  • Other comments: This KEP introduces scoped CEL expressions (derivedAttributes) to DeviceRequest to synthesize virtual grouping keys on the fly.

/sig scheduling
/wg device-management

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. wg/device-management Categorizes an issue or PR as relevant to WG Device Management. labels May 15, 2026
@github-project-automation github-project-automation Bot moved this to Needs Triage in SIG Scheduling May 15, 2026
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label May 15, 2026
@k8s-ci-robot k8s-ci-robot requested review from dom4ha and macsko May 15, 2026 00:23
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 15, 2026
@gauravkghildiyal gauravkghildiyal mentioned this pull request May 15, 2026
4 tasks
@gauravkghildiyal gauravkghildiyal force-pushed the dra-derived-attributes branch from 217ed13 to 7d68c53 Compare May 15, 2026 00:33
@gauravkghildiyal gauravkghildiyal force-pushed the dra-derived-attributes branch from 7d68c53 to df713b1 Compare May 15, 2026 00:35
Comment thread keps/sig-scheduling/6080-dra-derived-attributes/kep.yaml Outdated
Comment thread keps/sig-scheduling/6080-dra-derived-attributes/README.md Outdated
@johnbelamaric

Copy link
Copy Markdown
Member

@thameem-abbas I think this might be useful for your use cases.

@johnbelamaric

Copy link
Copy Markdown
Member

cc @pohly

@pravk03

pravk03 commented May 15, 2026

Copy link
Copy Markdown
Contributor

/cc

@k8s-ci-robot k8s-ci-robot requested a review from pravk03 May 15, 2026 03:21

@pohly pohly left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay to me.

Shadow API review: we need to sort out some details, see comments.

Comment thread keps/sig-scheduling/6080-dra-derived-attributes/README.md Outdated
Comment thread keps/sig-scheduling/6080-dra-derived-attributes/README.md Outdated
Comment thread keps/sig-scheduling/6080-dra-derived-attributes/README.md
Comment thread keps/sig-scheduling/6080-dra-derived-attributes/README.md Outdated
- **Environment**: The CEL environment for `Expression` is exactly the same
as that for `CELDeviceSelector`, containing a single variable `device`.
- **Return Type**: The CEL expression must evaluate to a scalar (string,
integer, boolean) or a list of scalars (`[]string`, `[]int64`, `[]bool`).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "must evaluate to" cannot be validated during admission time (depends on unknown input types).

How are evaluation errors at runtime handled?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I should have clarified this. Missed including it here but my thoughts were to have the same behaviour as CELDeviceSelector. Precisely this

I've update the KEP to that effect.

Comment thread keps/sig-scheduling/6080-dra-derived-attributes/README.md Outdated
Comment thread keps/sig-scheduling/6080-dra-derived-attributes/kep.yaml
Comment thread keps/sig-scheduling/6080-dra-derived-attributes/README.md
Comment thread keps/sig-scheduling/6080-dra-derived-attributes/README.md

@yongruilin yongruilin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declarative Validation plans to support validation for new API fields without handwritten validation counterpart directly.
But for complicated validation(e.g. CEL compilable), might still need to reply on handwritten.
/cc @jpbetz

Comment thread keps/sig-scheduling/6080-dra-derived-attributes/README.md
@k8s-ci-robot k8s-ci-robot requested a review from jpbetz May 15, 2026 21:27
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation May 18, 2026
@kad

kad commented May 18, 2026

Copy link
Copy Markdown
Member

/cc

@k8s-ci-robot k8s-ci-robot requested a review from kad May 18, 2026 14:07
@gauravkghildiyal

Copy link
Copy Markdown
Member Author

We have already progressed on doing it transparently, with a change in the matchAttribute semantics and a special CEL function to simplify that.

Good point. Yes for matchAttribute its simpler because there is no CEL expression involved and the kubernetes scheduler can be made intelligent to be able to 'match' an element with a single-element-list.

I'm referring more to the existing use of pcieRoot attribute within CELExpressions for selection purposes. In particular, this limitation highlighted in the List Attributes KEP

image

In light of this, I'm thinking it may serve our end users better if DRA Drivers don't try to migrate their single-element to list-type and instead introduce a different attribute for lists when needed.

@thameem-abbas

Copy link
Copy Markdown

I like the idea of different APIs if that's just made clear now before more DRA drivers standardize and it becomes harder

@johnbelamaric

Copy link
Copy Markdown
Member

Two different attributes force users to know which one each driver publishes. The same attribute avoids that.

@gauravkghildiyal

gauravkghildiyal commented Jun 3, 2026

Copy link
Copy Markdown
Member Author

Two different attributes force users to know which one each driver publishes. The same attribute avoids that.

But as things exist today in KEP-5491-dra-list-types-for-attributes, we don't yet have a solution which covers all backward compatibility issues and the migration of single scalar to a list type isn't transparent. So the cost of migrating an attribute from scalar to a list is:

  • Potential of breaking existing CEL expressions that expect scalar value
  • While it simplifies the naming (by keeping the same name), it forces users to use specialized CEL functions to handle the overloaded type, which may negate the UX gains.

These are all great discussions to have, but is it fair to say a solution to this problem is better discussed in the scope of KEP-5491 and is independent of this KEP? CC: @everpeace

@ffromani

ffromani commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Two different attributes force users to know which one each driver publishes. The same attribute avoids that.

But as things exist today in KEP-5491-dra-list-types-for-attributes, we don't yet have a solution which covers all backward compatibility issues and the migration of single scalar to a list type isn't transparent. So the cost of migrating an attribute from scalar to a list is:

* Potential of breaking existing CEL expressions that expect scalar value

* While it simplifies the naming (by keeping the same name), it forces users to use specialized CEL functions to handle the overloaded type, which may negate the UX gains.

These are all great discussions to have, but is it fair to say a solution to this problem is better discussed in the scope of KEP-5491 and is independent of this KEP? CC: @everpeace

+1. There's very active work about adding the pcieRoot attribute support to the dra-driver-cpu which we aim to ship in 0.2.0 and we could use some guidance, but I don't want to derail the KEP review. Perhaps a topic for the WG-device-management meeting?

@ania-borowiec

Copy link
Copy Markdown
Contributor

/cc @ania-borowiec

@dom4ha

dom4ha commented Jun 10, 2026

Copy link
Copy Markdown
Member

There are no concerns regarding this feature from the scheduler perspective. The time overhead is linear, so fully acceptable. Leaving the usability aspects on wg-device-management to decide.
/approve

1 similar comment
@dom4ha

dom4ha commented Jun 10, 2026

Copy link
Copy Markdown
Member

There are no concerns regarding this feature from the scheduler perspective. The time overhead is linear, so fully acceptable. Leaving the usability aspects on wg-device-management to decide.
/approve

@aojea

aojea commented Jun 10, 2026

Copy link
Copy Markdown
Member

/lgtm (sig network)

networking community is using DRA as extension mechanism, but is unrealistic for networking to agree in all the possible standard attributes in such fragmented space. This allow us to provide a simple mechanism for iteration

@gauravkghildiyal

Copy link
Copy Markdown
Member Author

Thanks for the reviews @dom4ha and @aojea

@johnbelamaric and @pohly -- Can one of you give the final approvals (including PRR)? Thanks

@pohly pohly left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, a suggestion for integration/performance testing, otherwise almost there as far as I am concerned.

// +featureGate=DRADerivedAttributes
// +listType=map
// +listMapKey=name
// +k8s:optional

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I know I suggested +k8s:optional as replacement for +optional earlier, but since then I learned that both are needed (the former for declarative validation, the latter for OpenAPI).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. Updated with +k8s:optional

// - attributes (map[string]object): the device's attributes, grouped by prefix
// (e.g. device.attributes["dra.example.com"] evaluates to an object with all
// of the attributes which were prefixed by "dra.example.com").
// - capacity (map[string]object): the device's capacities, grouped by prefix.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: let's just say "which carries the same properties as in a CELDeviceSelector`?

For example, allowMultipleAllocations probably will be available (why should it not be?), but isn't described here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated the statement to:

	// The expression's input is an object named "device", which carries the
	// same properties as in a CELDeviceSelector.

- Verify correct CEL evaluation and constraint matching.

##### Integration tests
- `test/integration/scheduler/dra`:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this in test/integration/scheduler/dra, do this in test/integration/scheduler_perf/dra/derived-attributes (new folder). Make sure to include a realistic scenario (number of attributes, complexity of the CEL expressions), then define a simple case for correctness checking and larger cases for performance measurement.

See consumable capacity (currently being updated in kubernetes/kubernetes#139511 because we didn't do that well initially).

With that done I don't see a need to add more testcases for this to test/integration/scheduler/dra.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Updated the directory and things to keep in mind during implementation.

#### Beta
- Gather feedback from DRA driver maintainers (SIG Node / SIG Network).
- Any additional e2e tests implemented and running in Testgrid canaries.
- Verify scheduler performance and latency overhead with large device counts.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what you'll need the scheduler_perf cases for.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated a reference to the scheduler_perf test discussed in the other comment.

@johnbelamaric johnbelamaric left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor questions/comments but PRR generally looks good. Please respond to these and then I can give approval

- [x] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name: DRADerivedAttributes
- Components depending on the feature gate: kube-apiserver,
kube-controller-manager, kube-scheduler, kubelet

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why kubelet?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. You did ask me to clean this up from the PRR but I missed cleaning it up here. Removed now.

###### Are there any tests for feature enablement/disablement?

Yes. Unit tests in `pkg/apis/resource/validation` verify that
`derivedAttributes` and non-FQDN `matchAttribute` strings are rejected when the

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also cover distinctAttribute

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly worded distinctAttributes too.

2. When evaluating candidate devices for a request, the plugin executes the
cached CEL expressions against each `Device` object. The computed values are
stored in a temporary map associated with the candidate device.
3. When evaluating `matchAttribute` constraints during the plugin

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should apply to anything in constraints, not just matchAttribute. Today that means also distinctAttribute.

Also: can derived attributes be used in device request CEL selectors? ie, are they evaluated BEFORE or AFTER the selectors?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. In CEL expressions, the derived attributes aren't really needed (the expression itself can embed the necessary mapping). But it would be more consistent and perhaps even expected if they were present. I'm +1 for that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @johnbelamaric and @pohly! The use of derivedAttributes within CELDeviceSelector is something I've thought about, but intentionally did not include in this KEP.

My main reason for keeping it out of scope—as you both noted—is that it isn't strictly necessary. If a user needs that logic in the selector, the CELDeviceSelector can just embed the necessary mapping directly. Attempting to include derivedAttributes into selectors brings additional complexities, such as how they impact performance by forcing early evaluation, and how they should be referenced in the CEL environment (i.e. whether we add them to device.attributes or have to introduce a new device.derivedAttributes map)

Perhaps earlier in the KEP cycle we could have had these discussions in detail, but seeing as we're close to the KEP deadline, I'd rather avoid introducing this change without more careful consideration. If this usability experience is needed in the future, a different KEP would be a perfect opportunity to explore that avenue. Hope you both share the same thoughts :)

With that in mind, I've added this bit of clarification:

- **Evaluation Order**: Derived attributes are evaluated **after** the device
  request's `CELDeviceSelector` has filtered candidate devices. They are not
  injected into the selector's CEL environment and are exclusively used for
  evaluating constraints like `matchAttribute` and `distinctAttribute`.

and also added a separate section ## Future Considerations > ### Derived Attributes in Device Selectors on the current rationale and future improvement through a separate enhancement (if needed)

@gauravkghildiyal gauravkghildiyal force-pushed the dra-derived-attributes branch from 4a206eb to 73279ea Compare June 15, 2026 22:06
@johnbelamaric

Copy link
Copy Markdown
Member

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 15, 2026
@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dom4ha, gauravkghildiyal, johnbelamaric

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2026
@k8s-ci-robot k8s-ci-robot merged commit ece901a into kubernetes:master Jun 15, 2026
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.37 milestone Jun 15, 2026
@github-project-automation github-project-automation Bot moved this from Todo to Done in @liggitt Jun 15, 2026
@github-project-automation github-project-automation Bot moved this from Needs Triage to Done in SIG Scheduling Jun 15, 2026
@pohly pohly moved this from 👀 In review to ✅ Done in Dynamic Resource Allocation Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Projects

Status: Done
Archived in project
Status: ✅ Done
Status: Done

Development

Successfully merging this pull request may close these issues.